Abstract. The LASSO is a variable subset selection procedure in statistical linear regression based on $\ell_1$ penalization of the least-squares operator. Its behavior crucially depends, both in practice and in theory, on the ratio between the fidelity term and the penalty term. We provide a detailed analysis of the fidelity vs. penalty ratio as a function of the relaxation parameter. Our study is based on a general position condition on the design matrix which holds with probability one for most experimental models. Along the way, the proofs of some well-known basic properties of the LASSO are provided from this new generic point of view.

1. Introduction 

1.1. Problem statement and main results. The well-known standard Gaussian linear model in statistics reads $y = X\beta + z$, where $X$ denotes an $n \times p$ design matrix, $\beta \in \mathbb{R}^p$ is an unknown parameter and the components of the error $z$ are assumed i.i.d. with normal distribution $\mathcal{N}(0,\sigma^2)$. Let us briefly recall some basic notations. For $I \subset \{1,\dots,p\}$, $|I|$ denotes the cardinality of $I$. For $x \in \mathbb{R}^p$, we set $x_I = (x_i)_{i \in I} \in \mathbb{R}^{|I|}$. The usual scalar product is denoted by $\langle \cdot,\cdot \rangle$. The notations for the norms on vectors and matrices are also standard: for any vector $x = (x_i) \in \mathbb{R}^N$,



$\|x\|_2 = \Big(\sum_{1 \le i \le N} x_i^2\Big)^{1/2}, \qquad \|x\|_1 = \sum_{1 \le i \le N} |x_i|, \qquad \|x\|_\infty = \sup_{1 \le i \le N} |x_i|.$

For any matrix $A$, we denote by $A^t$ its transpose. For $I \subset \{1,\dots,p\}$ and a matrix $X$, we denote by $X_I$ the submatrix whose columns are indexed by $I$.

The case where $p$ is much larger than $n$ has been the subject of intense recent study. This problem is of course not solvable for every $\beta$, but it has been discovered that if $\beta$ is sufficiently sparse, then the solution of

(1.1) $\hat\beta_\lambda \in \operatorname{argmin}_{b \in \mathbb{R}^p} \tfrac12\|y - Xb\|_2^2 + \lambda\|b\|_1,$

called the LASSO estimator of $\beta$, is sometimes also sparse and close to $\beta$.
The acronym LASSO, due to [11], stands for Least Absolute Shrinkage and Selection Operator, and stems from the fact that the $\ell_1$-norm penalty shrinks the components of the standard least-squares estimator. Some components are shrunk to the point of being set to zero, hence implying automatic selection of the remaining nonzero components as good predictors for the experiments under study. We refer the interested reader to [5] and [9] for an overview of the relationships between sparsity and statistics, and of sparsity-promoting penalizations of the least-squares criterion. Recent results concerning the LASSO and extensions to other statistical models and penalization strategies may be found in [3], [1], [6] and [12], for instance.

Under the assumption that the columns of $X$ are sufficiently "uncorrelated", several authors were able to prove that, with high probability, the $\ell_2$-norm of $X(\hat\beta_\lambda - \beta)$ is of the same order of magnitude as the $\ell_2$-norm of $X(\tilde\beta - \beta)$ for an oracle $\tilde\beta$; in this sense, the LASSO may even perform as well as an "oracle". For instance, the oracle proposed in [6] is a solution of

$\tilde\beta \in \operatorname{argmin}_{b \in \mathbb{R}^p,\ b_{T^c} = 0} \tfrac12\|y - X_T b_T\|_2^2 + \lambda\,\operatorname{sgn}(\beta_T)^t b_T,$

where $T$ is the index set of the non-zero components of $\beta$. The term "oracle" emphasizes that this estimator relies on the support of $\beta$, which is usually unknown ahead of time. Under stronger assumptions, it was further proven in [4] and [6] that the support and sign pattern of $\beta$ can be recovered exactly with high probability. A very efficient algorithm for solving the LASSO estimation problem, based on Nesterov's method, is described in [2].

A central quantity in the numerical analysis of the LASSO is the ratio

$\Gamma(\lambda) = \dfrac{\lambda\|\hat\beta_\lambda\|_1}{\|y - X\hat\beta_\lambda\|_2^2}.$

As is well known to both practitioners and theoreticians, severe problems occur when $\Gamma$ is either very small or very large. The present paper provides a detailed analysis of $\Gamma$ as a function of $\lambda$. The proofs rely on a general position condition which holds with probability one for most random design matrix models. Along the way, we prove from this generic viewpoint several results on the LASSO estimator which seem to belong to the folklore: uniqueness, continuity and piecewise affine parametrization as a function of $\lambda$. Our main result states that there exists $\tau > 0$ such that $\Gamma$ is decreasing on $(0,\tau]$ with $\Gamma(\tau) = 0$, and that $\|y - X\hat\beta_\lambda\|_2^2$ is increasing on $(0,\tau]$.
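As an illustration only, the following Python sketch (NumPy assumed) evaluates $\Gamma(\lambda)$ on a random Gaussian design. The LASSO solutions are computed by plain cyclic coordinate descent, a swapped-in technique that is not the Nesterov-type method of [2]; the helpers `lasso_cd` and `gamma_ratio`, the problem sizes and the seed are hypothetical choices made for this example.

```python
import numpy as np


def lasso_cd(X, y, lam, sweeps=3000, tol=1e-13):
    """Hypothetical helper: cyclic coordinate descent for
    min_b 0.5*||y - X b||_2^2 + lam*||b||_1."""
    p = X.shape[1]
    b = np.zeros(p)
    r = np.array(y, dtype=float)          # residual y - X b (b starts at 0)
    col_sq = np.sum(X * X, axis=0)        # squared column norms ||X_j||_2^2
    for _ in range(sweeps):
        largest_step = 0.0
        for j in range(p):
            rho = X[:, j] @ r + col_sq[j] * b[j]                        # X_j^t (y - X b + X_j b_j)
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]   # soft-thresholding
            step = new - b[j]
            if step != 0.0:
                r -= step * X[:, j]
                b[j] = new
                largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break
    return b


def gamma_ratio(X, y, lam):
    """Gamma(lam) = lam * ||b||_1 / ||y - X b||_2^2 evaluated at the LASSO solution b."""
    b = lasso_cd(X, y, lam)
    return lam * np.abs(b).sum() / np.sum((y - X @ b) ** 2)


rng = np.random.default_rng(0)
X = rng.standard_normal((10, 20))
y = rng.standard_normal(10)
lam_max = np.max(np.abs(X.T @ y))   # for lam >= lam_max, b = 0 satisfies (2.7)
for frac in (0.25, 0.5, 0.75, 1.25):
    print(f"lambda = {frac:.2f} * lam_max -> Gamma = {gamma_ratio(X, y, frac * lam_max):.4f}")
```

The last value is exactly zero: when $\lambda \ge \|X^t y\|_\infty$ every soft-thresholding step returns 0, so the solver never leaves $b = 0$.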

1.2. The General Position Condition. Our main assumption on the design matrix $X$ is the following.

Assumption 1.1. (General Position Condition for $X$) For all supports $S \neq S' \subset \{1,\dots,p\}$ and all $(\varepsilon_S, \varepsilon_{S'}) \in \{-1,1\}^{|S|} \times \{-1,1\}^{|S'|}$ such that $X_S$ and $X_{S'}$ are non-singular, we have

(1.2) $\varepsilon_S^t(X_S^tX_S)^{-1}\varepsilon_S + \varepsilon_{S'}^t(X_{S'}^tX_{S'})^{-1}\varepsilon_{S'}$

(1.3) $\qquad \neq 2\,\varepsilon_S^t(X_S^tX_S)^{-1}(X_S^tX_{S'})(X_{S'}^tX_{S'})^{-1}\varepsilon_{S'} + \big|\varepsilon_S^t(X_S^tX_S)^{-1}\varepsilon_S - \varepsilon_{S'}^t(X_{S'}^tX_{S'})^{-1}\varepsilon_{S'}\big|.$

Since $S \neq S'$, this property clearly holds with probability one if the entries of $X$ are independent and have an absolutely continuous density with respect to the Lebesgue measure. This is a generic situation in statistics, where the covariate measurements are usually corrupted by some noise. In the case of a more general type of design, we believe that this definition could easily be generalized so as to guarantee that (1.2) fails with probability at most of the order $p^{-\alpha}$, or is automatically satisfied for a carefully chosen deterministic design. A similar property, called the Unicity Condition (UC), was proposed in [7] for the problem of finding the sparsest solution of a linear system, with application to the field of compressed sensing.
1.3. Plan of the paper. Section 2 recalls the optimality conditions associated with the LASSO. In Section 3, we study the standard LASSO estimator of $\beta$ as a function of $\lambda$. In particular, we establish various continuity and monotonicity properties of some important functions of $\hat\beta_\lambda$, using only the General Position Condition. Based on these results, we prove our main result, Theorem 4.2, in Section 4.

1.4. Additional notations. The set of symmetric real $n \times n$ matrices is denoted by $\mathbb{S}_n$. For any matrix $A$, we denote by $\|A\|$ the operator norm of $A$. The maximum (resp. minimum) singular value of $A$ is denoted by $\sigma_{\max}(A)$ (resp. $\sigma_{\min}(A)$). Recall that $\sigma_{\max}(A) = \|A\|$ and, for invertible $A$, $\sigma_{\min}(A)^{-1} = \|A^{-1}\|$. We use the Loewner ordering on symmetric real matrices: if $A \in \mathbb{S}_n$, $0 \preceq A$ is equivalent to saying that $A$ is positive semi-definite, and $A \preceq B$ stands for $0 \preceq B - A$.

For any vector $b \in \mathbb{R}^p$, $b^+$ (resp. $b^-$) denotes its non-negative (resp. non-positive) part, i.e. $b = b^+ - b^-$, with $b_j^+, b_j^- \ge 0$.

For a given support $S \subset \{1,\dots,p\}$, we denote the range of $X_S$ by $V_S$ and the orthogonal projection onto $V_S$ by $P_{V_S}$, also written $P_S$. Recall that, when $X_S$ is non-singular,

$P_{V_S} = X_S(X_S^tX_S)^{-1}X_S^t.$

The support of $\hat\beta_\lambda$ is denoted by $T_\lambda$. For the sake of notational simplicity, we write

(1.4) $\hat\beta_{T_\lambda} := (\hat\beta_\lambda)_{T_\lambda}.$

2. Optimality conditions 

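In this section, we review the standard optimality conditions for the LASSO estimator. A necessary and sufficient optimality condition in (1.1) is that

(2.5) $0 \in \partial\big(\tfrac12\|y - X\hat\beta_\lambda\|_2^2 + \lambda\|\hat\beta_\lambda\|_1\big),$

where $\partial$ denotes the sub-differential (with respect to $b$, taken at $b = \hat\beta_\lambda$). This is equivalent to the existence of $g_\lambda$ in $\partial\|\cdot\|_1(\hat\beta_\lambda)$ such that

(2.6) $-X^t(y - X\hat\beta_\lambda) + \lambda g_\lambda = 0.$

On the other hand, the sub-differential of $\|\cdot\|_1$ at $\hat\beta_\lambda$ is given by

$\partial\|\cdot\|_1(\hat\beta_\lambda) = \{\gamma \in \mathbb{R}^p :\ \gamma_{T_\lambda} = \operatorname{sgn}(\hat\beta_{T_\lambda}) \text{ and } \|\gamma_{T_\lambda^c}\|_\infty \le 1\}.$

Thus, using the fact that $y = X\beta + z$, we may easily conclude that a necessary and sufficient condition for optimality in (1.1) is the existence of a vector $g_\lambda$ satisfying $g_{T_\lambda} = \operatorname{sgn}(\hat\beta_{T_\lambda})$ and $\|g_{T_\lambda^c}\|_\infty \le 1$, and such that

(2.7) $X^t(y - X\hat\beta_\lambda) = \lambda g_\lambda.$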
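Condition (2.7) doubles as a numerical certificate of optimality. The hedged Python sketch below (NumPy assumed) forms $g = X^t(y - Xb)/\lambda$ for a candidate $b$ and reports how far it is from satisfying $g_T = \operatorname{sgn}(b_T)$ and $\|g_{T^c}\|_\infty \le 1$, where $T$ is the support of $b$. The coordinate-descent helper `lasso_cd` and the function `kkt_violation` are hypothetical; any LASSO solver could be plugged in instead.

```python
import numpy as np


def lasso_cd(X, y, lam, sweeps=3000, tol=1e-13):
    """Hypothetical helper: cyclic coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1."""
    p = X.shape[1]
    b = np.zeros(p)
    r = np.array(y, dtype=float)          # residual y - X b
    col_sq = np.sum(X * X, axis=0)
    for _ in range(sweeps):
        largest_step = 0.0
        for j in range(p):
            rho = X[:, j] @ r + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            step = new - b[j]
            if step != 0.0:
                r -= step * X[:, j]
                b[j] = new
                largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break
    return b


def kkt_violation(X, y, lam, b):
    """Largest violation of (2.7): g = X^t (y - X b) / lam must equal sgn(b_j) on the
    support of b and have absolute value at most 1 off the support."""
    g = X.T @ (y - X @ b) / lam
    on = b != 0
    v_on = np.max(np.abs(g[on] - np.sign(b[on])), initial=0.0)
    v_off = np.max(np.abs(g[~on]) - 1.0, initial=0.0)
    return max(v_on, v_off)


rng = np.random.default_rng(0)
X = rng.standard_normal((10, 20))
y = rng.standard_normal(10)
lam = 0.3 * np.max(np.abs(X.T @ y))
b = lasso_cd(X, y, lam)
print("violation at the computed solution:", kkt_violation(X, y, lam, b))
print("violation at b = 0:", kkt_violation(X, y, lam, np.zeros(20)))
```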

The following corollary is a direct but important consequence of these preliminary remarks.

Corollary 2.1. A necessary and sufficient condition for a given vector $b$ with support $T$ to simultaneously satisfy the two following conditions:

(1) $b = \hat\beta_\lambda$,
(2) $b$ has the same support $T$ and sign pattern $\operatorname{sgn}(\beta_T)$ as $\beta$,

is that

(2.8) $X_T^t(y - Xb) = \lambda\operatorname{sgn}(\beta_T),$

(2.9) $\|X_{T^c}^t(y - Xb)\|_\infty < \lambda.$

Proof. The fact that (2.8) and (2.9) are necessary is a straightforward consequence of (2.7). Conversely, assume that (2.8) and (2.9) hold. Set

(2.10) $g = \tfrac{1}{\lambda}X^t(y - Xb).$

Using (2.6), we deduce that $g$ belongs to $\partial\|\cdot\|_1(b)$ and that the support of $b$ is exactly the set $\hat T = \{j \in \{1,\dots,p\} :\ |g_j| = 1\}$. On the other hand, we have

(2.11) $g_T = \operatorname{sgn}(\beta_T),$

(2.12) $\|g_{T^c}\|_\infty < 1,$

and we may deduce that $g$ is at the same time in the sub-differential of any vector in $\mathbb{R}^p$ with the same support and sign pattern as $\beta$. Therefore, we have

(2.13) $\hat T = \{j \in \{1,\dots,p\} :\ |g_j| = 1\} = T,$

and we conclude that $\beta$ and $b$ have the same support. Moreover, the index set $T^+$ of the positive components of $\beta$ and the index set $\hat T^+$ of the positive components of $b$ satisfy

(2.14) $\hat T^+ = \{j \in \{1,\dots,p\} :\ g_j = 1\} = T^+.$

The same argument implies that the index set $T^-$ of the negative components of $\beta$ equals the index set of the negative components of $b$. To sum up, $\beta$ and $b$ have the same support and sign pattern, and the proof is completed. This moreover implies that (2.8) and (2.9) are the optimality conditions for (1.1), and we obtain that $b = \hat\beta_\lambda$ as announced. □
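Corollary 2.1 suggests a brute-force way to compute $\hat\beta_\lambda$ on very small problems: guess a support $T$ and a sign vector $s$, solve (2.8) for $b_T$, and keep the guess only if $\operatorname{sgn}(b_T) = s$ and the strict inequality (2.9) holds. The Python sketch below (NumPy assumed; `certified_candidates` is a hypothetical name) enumerates all guesses with $1 \le |T| \le n$. Its cost is exponential in $p$, so it is an illustration rather than a practical solver. For generic data and $0 < \lambda < \|X^t y\|_\infty$, one expects exactly one surviving candidate, in line with the uniqueness result of Theorem 3.3.

```python
import itertools

import numpy as np


def certified_candidates(X, y, lam):
    """Return every (T, s, b) with 1 <= |T| <= n such that b_T solves (2.8) for the
    sign vector s, sgn(b_T) = s, and the strict inequality (2.9) holds off T."""
    n, p = X.shape
    found = []
    for k in range(1, min(n, p) + 1):
        for T in itertools.combinations(range(p), k):
            cols = list(T)
            XT = X[:, cols]
            for signs in itertools.product((-1.0, 1.0), repeat=k):
                s = np.array(signs)
                bT = np.linalg.solve(XT.T @ XT, XT.T @ y - lam * s)   # (2.8) on T
                if not np.array_equal(np.sign(bT), s):
                    continue                                          # wrong sign pattern
                b = np.zeros(p)
                b[cols] = bT
                off = [j for j in range(p) if j not in T]
                if np.all(np.abs(X[:, off].T @ (y - X @ b)) < lam):   # (2.9)
                    found.append((T, s, b))
    return found


rng = np.random.default_rng(1)
X = rng.standard_normal((4, 7))
y = rng.standard_normal(4)
lam = 0.3 * np.max(np.abs(X.T @ y))
cands = certified_candidates(X, y, lam)
print("number of certified candidates:", len(cands))
for T, s, b in cands:
    print("support:", T, "signs:", s)
```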



3. The LASSO estimator as a function of $\lambda$

This section establishes various continuity and monotonicity properties of some important functions of $\hat\beta_\lambda$, using only the General Position Condition.

The following notations will be useful. Define $\mathcal{C}$ as the cost function:

(3.15) $\mathcal{C}:\ \mathbb{R}_+^* \times \mathbb{R}^p \to \mathbb{R}_+, \qquad (\lambda, b) \mapsto \tfrac12\|y - Xb\|_2^2 + \lambda\|b\|_1,$

and for all $\lambda > 0$,

(3.16) $\theta(\lambda) = \inf_{b \in \mathbb{R}^p} \mathcal{C}(\lambda, b).$
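3.1. More on the estimator $\hat\beta_\lambda$. We begin with the following useful characterization of the LASSO estimators. For any $w \in \mathbb{R}^p$, let us introduce

(3.17) $\mathcal{V}(w) = \operatorname{argmin}_{b \in \mathbb{R}^p,\ Xb = Xw} \|b\|_1.$

Lemma 3.1. A vector $\hat\beta_\lambda$ is a solution of (1.1) if and only if $\hat\beta_\lambda \in \mathcal{V}(\hat\beta_\lambda)$.

Proof. Let $\hat\beta_\lambda$ be a solution of (1.1) and let $\tilde\beta_\lambda \in \mathcal{V}(\hat\beta_\lambda)$. Then, we have

(3.18) $\|\tilde\beta_\lambda\|_1 \le \|\hat\beta_\lambda\|_1.$

On the other hand, the definition of $\hat\beta_\lambda$ implies that

(3.19) $\tfrac12\|y - X\tilde\beta_\lambda\|_2^2 + \lambda\|\tilde\beta_\lambda\|_1 \ge \tfrac12\|y - X\hat\beta_\lambda\|_2^2 + \lambda\|\hat\beta_\lambda\|_1.$

Moreover, since $X\tilde\beta_\lambda = X\hat\beta_\lambda$, we have that

(3.20) $\tfrac12\|y - X\tilde\beta_\lambda\|_2^2 = \tfrac12\|y - X\hat\beta_\lambda\|_2^2,$

and subtracting this equality from (3.19), we obtain that

$\|\tilde\beta_\lambda\|_1 \ge \|\hat\beta_\lambda\|_1,$

which, combined with (3.18), implies that

(3.21) $\|\tilde\beta_\lambda\|_1 = \|\hat\beta_\lambda\|_1.$

This last equality together with (3.20) implies the desired result. □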

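We now give a useful expression of $\hat\beta_\lambda$ in terms of $\lambda$ and the submatrix of $X$ indexed by $T_\lambda$.

Lemma 3.2. For any $\lambda > 0$ such that $\hat\beta_\lambda \neq 0$, the matrix $X_{T_\lambda}$ is non-singular and we have

(3.22) $\hat\beta_{T_\lambda} = (X_{T_\lambda}^tX_{T_\lambda})^{-1}\big(X_{T_\lambda}^ty - \lambda\operatorname{sgn}(\hat\beta_{T_\lambda})\big).$

Proof. Recall that the optimality conditions for the LASSO imply that

(3.23) $X_{T_\lambda}^t(y - X_{T_\lambda}\hat\beta_{T_\lambda}) = \lambda\operatorname{sgn}(\hat\beta_{T_\lambda}).$

Since $X_{T_\lambda}$ is non-singular, we obtain (3.22). □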
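Formula (3.22) can be compared against an iterative solver. In the hedged Python sketch below (NumPy assumed), `lasso_cd` is a hypothetical coordinate-descent helper unrelated to the paper, and the hypothetical `closed_form_from_support` reads off the support $T_\lambda$ and the signs of the computed solution and re-evaluates (3.22). For a converged solver, the two vectors should agree up to round-off.

```python
import numpy as np


def lasso_cd(X, y, lam, sweeps=3000, tol=1e-13):
    """Hypothetical helper: cyclic coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1."""
    p = X.shape[1]
    b = np.zeros(p)
    r = np.array(y, dtype=float)          # residual y - X b
    col_sq = np.sum(X * X, axis=0)
    for _ in range(sweeps):
        largest_step = 0.0
        for j in range(p):
            rho = X[:, j] @ r + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            step = new - b[j]
            if step != 0.0:
                r -= step * X[:, j]
                b[j] = new
                largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break
    return b


def closed_form_from_support(X, y, lam, b):
    """Re-evaluate (3.22) on the support T and the sign pattern of a given solution b."""
    T = np.flatnonzero(b)
    XT = X[:, T]
    out = np.zeros_like(b)
    out[T] = np.linalg.solve(XT.T @ XT, XT.T @ y - lam * np.sign(b[T]))
    return out


rng = np.random.default_rng(2)
X = rng.standard_normal((10, 20))
y = rng.standard_normal(10)
lam = 0.3 * np.max(np.abs(X.T @ y))
b = lasso_cd(X, y, lam)
b_formula = closed_form_from_support(X, y, lam, b)
print("support:", np.flatnonzero(b))
print("max |b_cd - b_(3.22)|:", np.max(np.abs(b - b_formula)))
```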

The following theorem establishes the existence and uniqueness of $\hat\beta_\lambda$ with support size less than or equal to $n$. In the sequel, $\hat\beta_\lambda$ will always refer to this solution.

Theorem 3.3. Assume that Assumption 1.1 holds. Then, almost surely, for any $\lambda > 0$, the minimization problem (1.1) has a unique solution $\hat\beta_\lambda$ with support $T_\lambda \subset \{1,\dots,p\}$ verifying

(3.24) $|T_\lambda| \le n.$

Proof. We first study the support of a possible solution $\hat\beta_\lambda$. Second, we derive (3.22), and eventually, we prove the uniqueness of $\hat\beta_\lambda$ under the General Position Condition.






STEPHANE CHRETIEN AND SEBASTIEN DARSES 



Study of $|T_\lambda|$. Recall that $b^+$ (resp. $b^-$) denotes the non-negative (resp. non-positive) part of $b$, i.e. $b = b^+ - b^-$ with $b_j^+, b_j^- \ge 0$. Then, Lemma 3.1 above equivalently says that $\hat\beta_\lambda$ is a solution of (1.1) if and only if $\hat\beta_\lambda^+$ and $\hat\beta_\lambda^-$ are solutions of

(3.25) $\min \sum_{j=1}^p (b_j^+ + b_j^-) \quad \text{s.t.} \quad Xb^+ - Xb^- = X\hat\beta_\lambda,\quad b^+, b^- \ge 0.$

The remainder of the proof relies on linear programming theory and Assumption 1.1. Notice first that the solution set is compact due to the coercivity of the $\ell_1$-norm. Thus, the theory of linear programming [10] ensures that each extreme point of the solution set of (3.25) is completely determined by a "basis" $B$. In the present setting, for an extreme point $b^* = b^{*+} - b^{*-}$ of the solution set of (3.25), the associated basis $B^*$ can be written (in a non-unique way) as $B^* = B^*_+ \cup B^*_-$, $|B^*| = n$, and is such that

(i) the square matrix $[X_{B^*_+}, -X_{B^*_-}]$ is non-singular,

(ii) $b^{*+}_j = 0$ for $j \notin B^*_+$ and $b^{*-}_j = 0$ for $j \notin B^*_-$, and

(iii) the couple $(b^{*+}_{B^*_+}, b^{*-}_{B^*_-})$ is uniquely determined by the system

(3.26) $X_{B^*_+}b^{*+}_{B^*_+} - X_{B^*_-}b^{*-}_{B^*_-} = X\hat\beta_\lambda$

(or equivalently, $Xb^* = X\hat\beta_\lambda$).

An immediate consequence is that the support of $b^*$ has cardinality at most $n$. Moreover, $b^* \in \mathcal{V}(b^*)$, and using Lemma 3.1, we deduce that $b^*$ is a solution of (1.1). Therefore, we may assume without loss of generality that $\hat\beta_\lambda$ is an extreme point of $\mathcal{V}(\hat\beta_\lambda)$, with $|T_\lambda| \le n$, and that $X_{T_\lambda}$ is non-singular.

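Uniqueness of $\hat\beta_\lambda$: first part. We give two equations satisfied by $\lambda$ and $y$ in the case where uniqueness of the LASSO estimator fails.

Let $\hat\beta'_\lambda$ in $\mathbb{R}^p$ be another solution of (1.1). Using the same reasoning as for $\hat\beta_\lambda$ at the end of the last paragraph, we may assume w.l.o.g. that the support $T'_\lambda$ of $\hat\beta'_\lambda$ has cardinality at most $n$ and that $X_{T'_\lambda}$ is non-singular. Convexity of the LASSO functional implies that the map

$\varphi:\ [0,1] \to \mathbb{R}_+, \qquad t \mapsto \mathcal{C}\big(\lambda,\ t\hat\beta_\lambda + (1-t)\hat\beta'_\lambda\big)$

is constant.

Notice that the term $\|\hat\beta'_\lambda + t(\hat\beta_\lambda - \hat\beta'_\lambda)\|_1$ is in fact piecewise affine on $(0,1)$. Set

$\rho_\lambda = \operatorname{sgn}(\hat\beta_{T_\lambda}), \qquad \rho'_\lambda = \operatorname{sgn}(\hat\beta'_{T'_\lambda}).$

Now, let $t^* > 0$ be sufficiently small such that for all $t \in (0,t^*)$ the support of $\hat\beta'_\lambda + t(\hat\beta_\lambda - \hat\beta'_\lambda)$ is constant and equal to $T_\lambda \cup T'_\lambda$, and no sign change occurs.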


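(3.27) $\operatorname{sgn}\big(\hat\beta'_\lambda + t(\hat\beta_\lambda - \hat\beta'_\lambda)\big)$ does not depend on $t \in (0,t^*)$.

Set

(3.28) $\rho = \operatorname{sgn}\big(\hat\beta'_\lambda + t(\hat\beta_\lambda - \hat\beta'_\lambda)\big)_{T_\lambda \cup T'_\lambda}$, for any $t \in (0,t^*)$.

Thus, for all $t \in (0,t^*)$,

$\|\hat\beta'_\lambda + t(\hat\beta_\lambda - \hat\beta'_\lambda)\|_1 = \rho^t\big(\hat\beta'_\lambda + t(\hat\beta_\lambda - \hat\beta'_\lambda)\big)_{T_\lambda \cup T'_\lambda},$

with

$\rho_{T_\lambda} = \rho_\lambda \quad \text{and} \quad \rho_{T'_\lambda} = \rho'_\lambda,$

and we deduce that $\varphi$ is a second order polynomial in the variable $t \in (0,t^*)$. Since $\varphi$ is constant, the coefficients corresponding to the quadratic and linear terms of $\varphi$ must be zero. Developing the term $\tfrac12\|y - X(t\hat\beta_\lambda + (1-t)\hat\beta'_\lambda)\|_2^2$, we then obtain:

$X_{T_\lambda}\hat\beta_{T_\lambda} - X_{T'_\lambda}\hat\beta'_{T'_\lambda} = 0,$

$-y^t\big(X_{T_\lambda}\hat\beta_{T_\lambda} - X_{T'_\lambda}\hat\beta'_{T'_\lambda}\big) + \lambda\rho^t(\hat\beta_\lambda - \hat\beta'_\lambda) = 0,$

which is equivalent to

(3.29) $X_{T_\lambda}\hat\beta_{T_\lambda} - X_{T'_\lambda}\hat\beta'_{T'_\lambda} = 0,$

(3.30) $\rho^t(\hat\beta_\lambda - \hat\beta'_\lambda) = 0.$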
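Uniqueness of $\hat\beta_\lambda$: second part. As for $\hat\beta_\lambda$, we write

(3.31) $\hat\beta'_{T'_\lambda} = (X_{T'_\lambda}^tX_{T'_\lambda})^{-1}\big(X_{T'_\lambda}^ty - \lambda\operatorname{sgn}(\hat\beta'_{T'_\lambda})\big).$

Replacing (3.22) and (3.31) into (3.29), we obtain

(3.32) $(P_{T_\lambda} - P_{T'_\lambda})y - \lambda\big(X_{T_\lambda}(X_{T_\lambda}^tX_{T_\lambda})^{-1}\rho_\lambda - X_{T'_\lambda}(X_{T'_\lambda}^tX_{T'_\lambda})^{-1}\rho'_\lambda\big) = 0.$

On the other hand, (3.30) gives

(3.33) $0 = y^t\big(X_{T_\lambda}(X_{T_\lambda}^tX_{T_\lambda})^{-1}\rho_\lambda - X_{T'_\lambda}(X_{T'_\lambda}^tX_{T'_\lambda})^{-1}\rho'_\lambda\big)$
$\qquad - \lambda\big(\rho_\lambda^t(X_{T_\lambda}^tX_{T_\lambda})^{-1}\rho_\lambda - {\rho'_\lambda}^t(X_{T'_\lambda}^tX_{T'_\lambda})^{-1}\rho'_\lambda\big).$

Setting

$\eta_\lambda = X_{T_\lambda}(X_{T_\lambda}^tX_{T_\lambda})^{-1}\rho_\lambda - X_{T'_\lambda}(X_{T'_\lambda}^tX_{T'_\lambda})^{-1}\rho'_\lambda,$
$\zeta_\lambda = \rho_\lambda^t(X_{T_\lambda}^tX_{T_\lambda})^{-1}\rho_\lambda - {\rho'_\lambda}^t(X_{T'_\lambda}^tX_{T'_\lambda})^{-1}\rho'_\lambda,$

we obtain the system:

(3.34) $(P_{T_\lambda} - P_{T'_\lambda})y - \lambda\eta_\lambda = 0,$

(3.35) $y^t\eta_\lambda - \lambda\zeta_\lambda = 0.$

Notice that

$(P_{T_\lambda} - P_{T'_\lambda},\ \eta_\lambda,\ \zeta_\lambda) \in \mathcal{J}_1 \times \mathcal{J}_2 \times \mathcal{J}_3,$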
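where

$\mathcal{J}_1 = \{P_S - P_{S'} :\ S \neq S' \subset \{1,\dots,p\}\},$
$\mathcal{J}_2 = \{X_S(X_S^tX_S)^{-1}\varepsilon_S - X_{S'}(X_{S'}^tX_{S'})^{-1}\varepsilon_{S'} :\ (S,S',\varepsilon_S,\varepsilon_{S'}) \in \mathcal{G}\},$
$\mathcal{J}_3 = \{\varepsilon_S^t(X_S^tX_S)^{-1}\varepsilon_S - \varepsilon_{S'}^t(X_{S'}^tX_{S'})^{-1}\varepsilon_{S'} :\ (S,S',\varepsilon_S,\varepsilon_{S'}) \in \mathcal{G}\},$

with

$\mathcal{G} = \{(S,S',\varepsilon_S,\varepsilon_{S'}) :\ S \neq S' \subset \{1,\dots,p\},\ X_S, X_{S'} \text{ non-singular},\ (\varepsilon_S,\varepsilon_{S'}) \in \{-1,1\}^{|S|} \times \{-1,1\}^{|S'|}\}.$

Therefore, $(y,\lambda)$ is a solution of the finite set of systems

(3.36) $Qy - \lambda\eta = 0,$

(3.37) $y^t\eta - \lambda\zeta = 0,$

when $(Q,\eta,\zeta)$ is running over $\mathcal{J}_1 \times \mathcal{J}_2 \times \mathcal{J}_3$. This implies that

$\{(y,\lambda) :\ \lambda > 0 \text{ and uniqueness of the LASSO estimator fails}\} \subset \bigcup_{j \in \mathcal{J}} E_j,$

where $\mathcal{J}$ is a finite set and the $E_j \subset \mathbb{R}^{n+1}$ are linear subspaces.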

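Let us now show that no $E_j$, $j \in \mathcal{J}$, contains a subspace of dimension $n$. Let us suppose that this is not the case, i.e. there exist two supports $S \neq S'$ and $(\eta,\zeta) \in \mathcal{J}_2 \times \mathcal{J}_3$ such that for all $y \in \mathbb{R}^n$,

(3.38) $(P_S - P_{S'})y = \dfrac{\eta\eta^t}{\zeta}\,y.$

When the rank of $P_S - P_{S'}$ is different from 1, (3.38) cannot be satisfied for all $y \in \mathbb{R}^n$. Thus, we only have to focus on the case where the rank of $P_S - P_{S'}$ is 1, or equivalently, $|S \,\Delta\, S'| = 1$. We distinguish two cases: either $W_S := V_S^\perp \cap V_{S'} \neq \{0\}$ or $W_S = \{0\}$.

(i) If $W_S \neq \{0\}$, take $v \in W_S$, $v \neq 0$. Then $(P_S - P_{S'})v = -v$, and the only nonzero eigenvalue of $P_S - P_{S'}$ is $-1$.

(ii) If $W_S = \{0\}$, then $V_{S'} \subset V_S$ and so $W_{S'} := V_{S'}^\perp \cap V_S \neq \{0\}$. Hence, take a non-zero $v \in W_{S'}$. We now have $(P_S - P_{S'})v = v$, and the only nonzero eigenvalue of $P_S - P_{S'}$ is 1.

But the only nonzero eigenvalue of $\eta\eta^t/\zeta$ is $\|\eta\|_2^2/\zeta$. By developing

$\|\eta\|_2^2 = \|X_S(X_S^tX_S)^{-1}\varepsilon_S - X_{S'}(X_{S'}^tX_{S'})^{-1}\varepsilon_{S'}\|_2^2$

and comparing with

$\zeta = \varepsilon_S^t(X_S^tX_S)^{-1}\varepsilon_S - \varepsilon_{S'}^t(X_{S'}^tX_{S'})^{-1}\varepsilon_{S'},$

we see that the General Position Condition, Assumption 1.1, is equivalent to the following inequality:

$\|\eta\|_2^2 \neq |\zeta|.$

Therefore, the operators $P_S - P_{S'}$ and $\eta\eta^t/\zeta$ are different. Hence, (3.38) is not satisfied for all $y \in \mathbb{R}^n$ when the rank of $P_S - P_{S'}$ is 1.

As a conclusion, each $E_j$ has dimension less than $n$, so that its projection onto the $y$-coordinates is a proper subspace of $\mathbb{R}^n$. Since $y = X\beta + z$ has a density with respect to the Lebesgue measure on $\mathbb{R}^n$, the probability that there exists $\lambda > 0$ such that uniqueness of the LASSO estimator fails is equal to zero. □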
3.1.1. Continuity. Lemma 3.5 below addresses the continuity of $\lambda \mapsto \hat\beta_\lambda$. We start with a preliminary lemma.

Lemma 3.4. Let Assumption 1.1 hold. Then, the function $\theta$ is concave and non-decreasing.

Proof. Since $\theta$ is the infimum of a family of affine functions of the variable $\lambda$, it is concave. Moreover, we have

$\theta(\lambda) = \mathcal{C}(\lambda, \hat\beta_\lambda),$

where, by Theorem 3.3, $\hat\beta_\lambda$ is the unique solution of (1.1). Using the filling property [8, Chapter XII], we obtain that $\partial\theta(\lambda)$ is the singleton $\{\|\hat\beta_\lambda\|_1\}$. Thus, $\theta$ is differentiable and its derivative at $\lambda$ is given by

(3.39) $\theta'(\lambda) = \|\hat\beta_\lambda\|_1.$

Moreover, this last expression shows that $\theta$ is non-decreasing. □
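Identity (3.39) lends itself to a quick numerical sanity check: a central difference of $\theta$ should match $\|\hat\beta_\lambda\|_1$. The Python sketch below (NumPy assumed) is only an illustration; `lasso_cd` is a hypothetical coordinate-descent helper, and the step `h`, the sizes and the seed are arbitrary choices.

```python
import numpy as np


def lasso_cd(X, y, lam, sweeps=3000, tol=1e-13):
    """Hypothetical helper: cyclic coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1."""
    p = X.shape[1]
    b = np.zeros(p)
    r = np.array(y, dtype=float)          # residual y - X b
    col_sq = np.sum(X * X, axis=0)
    for _ in range(sweeps):
        largest_step = 0.0
        for j in range(p):
            rho = X[:, j] @ r + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            step = new - b[j]
            if step != 0.0:
                r -= step * X[:, j]
                b[j] = new
                largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break
    return b


def theta(X, y, lam):
    """theta(lam) = C(lam, beta_hat_lam) as in (3.16), returned with beta_hat_lam."""
    b = lasso_cd(X, y, lam)
    return 0.5 * np.sum((y - X @ b) ** 2) + lam * np.abs(b).sum(), b


rng = np.random.default_rng(0)
X = rng.standard_normal((10, 20))
y = rng.standard_normal(10)
lam = 0.4 * np.max(np.abs(X.T @ y))
h = 1e-5 * lam
fd = (theta(X, y, lam + h)[0] - theta(X, y, lam - h)[0]) / (2 * h)   # central difference
l1 = np.abs(theta(X, y, lam)[1]).sum()                               # ||beta_hat_lam||_1
print(f"finite difference: {fd:.8f}   ||beta_hat||_1: {l1:.8f}")
```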
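Lemma 3.5. Let Assumption 1.1 hold. Then, almost surely, the map

$(0,+\infty) \to \mathbb{R}^p, \qquad \lambda \mapsto \hat\beta_\lambda$

is bounded and continuous. Moreover, its $\ell_1$-norm is non-increasing.

Proof. We naturally divide the proof into three parts:

(i) $\|\hat\beta_\lambda\|_1$ is non-increasing. The fact that $\lambda \mapsto \|\hat\beta_\lambda\|_1$ is non-increasing is an immediate consequence of (3.39) and of the concavity of $\theta$.

(ii) Boundedness. Notice that using (3.22), we obtain that

$\|\hat\beta_\lambda\|_1 \le \max_{(S,\delta)} \|(X_S^tX_S)^{-1}(X_S^ty - \lambda\delta)\|_1,$

where the maximum runs over all supports $S$ with $|S| \le n$ and $X_S$ non-singular, and all $\delta \in \{-1,1\}^{|S|}$. Thus, $\lambda \mapsto \hat\beta_\lambda$ is bounded on any interval of the form $(0,M]$, with $M \in (0,+\infty)$. Moreover, since its $\ell_1$-norm is non-increasing, it is bounded on $(0,+\infty)$.

(iii) Continuity. Assume for contradiction that $\lambda \mapsto \hat\beta_\lambda$ is not continuous at some $\lambda^\circ > 0$. Using boundedness, we can construct two sequences $(\lambda_k)$ and $(\lambda'_k)$ converging to $\lambda^\circ$ such that $\hat\beta_{\lambda_k}$ and $\hat\beta_{\lambda'_k}$ converge to limits $\beta^{(1)}$ and $\beta^{(2)}$ respectively, with $\beta^{(1)} \neq \beta^{(2)}$. Since $\mathcal{C}$ and $\theta$ are continuous, both limits are optimal solutions of the problem

(3.40) $\operatorname{argmin}_{b \in \mathbb{R}^p} \mathcal{C}(\lambda^\circ, b),$

hence contradicting the uniqueness proven in Theorem 3.3 above.

□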
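The monotonicity part of Lemma 3.5 can be observed on a small example. The Python sketch below (NumPy assumed) computes $\hat\beta_\lambda$ exactly on a grid of values of $\lambda$ by enumerating supports and signs as in Corollary 2.1, then prints $\|\hat\beta_\lambda\|_1$. The function `lasso_exact` is a hypothetical helper whose cost is exponential in $p$; it is only meant for tiny illustrative problems.

```python
import itertools

import numpy as np


def lasso_exact(X, y, lam):
    """Hypothetical helper: exact LASSO solution of a tiny problem, obtained by returning
    the (T, s) whose solution of (2.8) has signs s and satisfies (2.9)."""
    n, p = X.shape
    if np.max(np.abs(X.T @ y)) <= lam:        # b = 0 satisfies (2.7)
        return np.zeros(p)
    for k in range(1, min(n, p) + 1):
        for T in itertools.combinations(range(p), k):
            cols = list(T)
            XT = X[:, cols]
            for signs in itertools.product((-1.0, 1.0), repeat=k):
                s = np.array(signs)
                bT = np.linalg.solve(XT.T @ XT, XT.T @ y - lam * s)
                if not np.array_equal(np.sign(bT), s):
                    continue
                b = np.zeros(p)
                b[cols] = bT
                off = [j for j in range(p) if j not in T]
                if np.all(np.abs(X[:, off].T @ (y - X @ b)) < lam):
                    return b
    raise RuntimeError("no certified candidate; the data may be degenerate")


rng = np.random.default_rng(3)
X = rng.standard_normal((4, 7))
y = rng.standard_normal(4)
lam_max = np.max(np.abs(X.T @ y))
grid = np.linspace(0.05, 1.2, 40) * lam_max
l1 = np.array([np.abs(lasso_exact(X, y, lam)).sum() for lam in grid])
print(np.round(l1, 4))   # non-increasing, and exactly 0 once lam >= lam_max
```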

4. The fidelity and penalty terms as functions of $\lambda$

Our main goal in this section is to study the function

(4.41) $\Gamma:\ (0,+\infty) \to \mathbb{R}_+ \cup \{+\infty\}, \qquad \lambda \mapsto \dfrac{\lambda\|\hat\beta_\lambda\|_1}{\|y - X\hat\beta_\lambda\|_2^2}.$

This function is important in order to study the numerical aspects of the LASSO estimator. Indeed, if the fidelity term is very small compared to the penalty term or vice versa, the resulting optimization problem might be very badly conditioned, and the resulting estimator may turn out to be useless in practice. We will prove in particular the very intuitive fact that $\Gamma$ is continuous, tends to $+\infty$ when $\lambda$ tends to zero, and is decreasing on an interval $(0,\tau]$ at the end of which it vanishes. Let us first begin with the following elementary result.

Lemma 4.1. (Nontriviality of the estimator) Let $\Sigma$ be the set

(4.42) $\Sigma = \{(S,\delta) :\ \emptyset \neq S \subset \{1,\dots,p\},\ \delta \in \{-1,1\}^{|S|},\ |S| \le n,\ \sigma_{\min}(X_S) > 0\}.$

The inequality

(4.43) $\inf_{(S,\delta) \in \Sigma} \|(X_S^tX_S)^{-1}(X_S^ty - \lambda\delta)\|_1 > 0$

holds with probability one.

Proof. This is an immediate consequence of the Gaussian distribution of $z$. □

Theorem 4.2. Let Assumption 1.1 hold. Then, the function $\Gamma$ defined by (4.41) almost surely satisfies

(4.44) $\lim_{\lambda \downarrow 0} \Gamma(\lambda) = +\infty.$

Moreover, almost surely, there exists $\tau > 0$ such that $\Gamma$ is decreasing on the interval $(0,\tau]$ with $\Gamma(\tau) = 0$, while $\|y - X\hat\beta_\lambda\|_2^2$ is increasing on $(0,\tau]$.

Proof. We will repeatedly use Lemmas 3.4 and 3.5. The proof is divided into four steps.

Step 1. $\lim_{\lambda \downarrow 0}\Gamma(\lambda) = +\infty$. We divide this proof into two parts.

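Step 1.a. We first show that $|T_\lambda| = n$ for $\lambda$ sufficiently small. Let $(\lambda_k)_{k \in \mathbb{N}}$ be any positive sequence converging to 0. Let $\beta^*$ be any cluster point of the sequence $(\hat\beta_{\lambda_k})_{k \in \mathbb{N}}$ (recall that this sequence is bounded thanks to Lemma 3.5). Fix $\epsilon > 0$ and $b \in \mathbb{R}^p$. For all $k \in \mathbb{N}$, we have

(4.45) $\mathcal{C}(\lambda_k, \hat\beta_{\lambda_k}) \le \mathcal{C}(\lambda_k, b).$

Since $\mathcal{C}(\lambda_k, \cdot)$ is continuous, we can also write, for $k$ sufficiently large along a subsequence such that $\hat\beta_{\lambda_k} \to \beta^*$:

$\mathcal{C}(\lambda_k, \beta^*) \le \mathcal{C}(\lambda_k, \hat\beta_{\lambda_k}) + \epsilon.$

Hence,

$\mathcal{C}(\lambda_k, \beta^*) \le \mathcal{C}(\lambda_k, b) + \epsilon.$

Letting $\lambda_k \to 0$, we obtain

$\tfrac12\|y - X\beta^*\|_2^2 \le \tfrac12\|y - Xb\|_2^2 + \epsilon,$

and thus, since $\epsilon$ and $b$ are arbitrary,

(4.46) $\tfrac12\|y - X\beta^*\|_2^2 \le \inf_{b \in \mathbb{R}^p} \tfrac12\|y - Xb\|_2^2.$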
Since $\operatorname{range}(X) = \mathbb{R}^n$, (4.46) implies

$\|y - X\beta^*\|_2^2 = 0,$

and then

(4.47) $\lim_{\lambda \downarrow 0}\|y - X\hat\beta_\lambda\|_2^2 = 0.$

Notice further that $\{b \in \mathbb{R}^p :\ |\operatorname{supp}(b)| < n\}$ is a finite union of subspaces of $\mathbb{R}^p$, each with dimension $n-1$. Thus,

(4.48) $m := \inf_{b \in \mathbb{R}^p,\ |\operatorname{supp}(b)| < n} \tfrac12\|y - Xb\|_2^2 > 0,$

with probability one. Therefore, for $\lambda$ sufficiently small, (4.47) implies

(4.49) $\tfrac12\|y - X\hat\beta_\lambda\|_2^2 < m,$

from which, together with (3.24), we deduce that $|T_\lambda| = n$.

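Step 1.b. Let $\lambda_0 > 0$ be sufficiently small so that $|T_\lambda| = n$ for all $\lambda < \lambda_0$. Such a $\lambda_0$ exists due to Step 1.a. Hence, since $X_{T_\lambda}$ is non-singular:

(4.50) $P_{V_{T_\lambda}} = \mathrm{Id}_n.$

Thus, using (3.22), we obtain

(4.51) $y - X\hat\beta_\lambda = \lambda X_{T_\lambda}(X_{T_\lambda}^tX_{T_\lambda})^{-1}\operatorname{sgn}(\hat\beta_{T_\lambda}),$

which implies that

(4.52) $\|y - X\hat\beta_\lambda\|_2^2 = \lambda^2\|(X_{T_\lambda}^tX_{T_\lambda})^{-1/2}\operatorname{sgn}(\hat\beta_{T_\lambda})\|_2^2.$

Moreover, Lemma 4.1 combined with (3.22) gives

(4.53) $\|\hat\beta_\lambda\|_1 \ge \inf_{(S,\delta) \in \Sigma}\|(X_S^tX_S)^{-1}(X_S^ty - \lambda\delta)\|_1 > 0.$

Hence, for $\lambda < \lambda_0$,

(4.54) $\Gamma(\lambda) = \dfrac{\|\hat\beta_\lambda\|_1}{\lambda\|X_{T_\lambda}(X_{T_\lambda}^tX_{T_\lambda})^{-1}\operatorname{sgn}(\hat\beta_{T_\lambda})\|_2^2} \ge \dfrac{\inf_{(S,\delta) \in \Sigma}\|(X_S^tX_S)^{-1}(X_S^ty - \lambda\delta)\|_1}{\lambda\,\sup_{(S,\delta) \in \Sigma}\|X_S(X_S^tX_S)^{-1}\delta\|_2^2}.$

Using the trivial fact that

(4.55) $\sup_{(S,\delta) \in \Sigma}\|X_S(X_S^tX_S)^{-1}\delta\|_2^2 < \infty,$

the proof of Step 1 is complete.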
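Identity (4.52) can be checked numerically once $\lambda$ is small enough for the support to be full. The Python sketch below (NumPy assumed) uses a hypothetical exact solver, `lasso_exact`, built on the certificate of Corollary 2.1 and practical only for tiny $n$ and $p$; the choice $\lambda = 10^{-4}\|X^ty\|_\infty$ is an assumption intended to land in the regime of Step 1.a, and the test confirms that $|T_\lambda| = n$ there before comparing both sides of (4.52).

```python
import itertools

import numpy as np


def lasso_exact(X, y, lam):
    """Hypothetical helper: exact LASSO solution of a tiny problem via Corollary 2.1."""
    n, p = X.shape
    if np.max(np.abs(X.T @ y)) <= lam:        # b = 0 satisfies (2.7)
        return np.zeros(p)
    for k in range(1, min(n, p) + 1):
        for T in itertools.combinations(range(p), k):
            cols = list(T)
            XT = X[:, cols]
            for signs in itertools.product((-1.0, 1.0), repeat=k):
                s = np.array(signs)
                bT = np.linalg.solve(XT.T @ XT, XT.T @ y - lam * s)
                if not np.array_equal(np.sign(bT), s):
                    continue
                b = np.zeros(p)
                b[cols] = bT
                off = [j for j in range(p) if j not in T]
                if np.all(np.abs(X[:, off].T @ (y - X @ b)) < lam):
                    return b
    raise RuntimeError("no certified candidate; the data may be degenerate")


rng = np.random.default_rng(4)
X = rng.standard_normal((3, 5))
y = rng.standard_normal(3)
lam = 1e-4 * np.max(np.abs(X.T @ y))
b = lasso_exact(X, y, lam)
T = np.flatnonzero(b)
s = np.sign(b[T])
lhs = np.sum((y - X @ b) ** 2)                               # ||y - X beta_hat||_2^2
rhs = lam**2 * (s @ np.linalg.solve(X[:, T].T @ X[:, T], s))  # lam^2 ||(X_T^t X_T)^{-1/2} s||^2
print("support size:", T.size, " lhs:", lhs, " rhs:", rhs)
```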



Step 2. Partitioning $(0,+\infty)$ into good intervals.

The continuity result of Lemma 3.5 implies that the interval $(0,+\infty)$ can be partitioned into subintervals of the type $I_k = (\lambda_k, \lambda_{k+1}]$, with

(i) $\lambda_0 = 0$ and $\lambda_k \in \mathbb{R}_+^* \cup \{+\infty\}$ for $k > 0$,

(ii) the support and sign pattern of $\hat\beta_\lambda$ are constant on each $I_k$.

Notice further that, due to Step 1.a, $T_\lambda \neq \emptyset$ on at least $I_0$. Let $\mathcal{K}$ be the nonempty set

(4.56) $\mathcal{K} = \{k \in \mathbb{N} :\ \forall \lambda \in I_k,\ \hat\beta_\lambda \neq 0\}.$

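On any interval $I_k$, $k \in \mathcal{K}$, Lemma 3.2 states that the expression (3.22) for $\hat\beta_{T_\lambda}$ holds. Multiplying (3.22) on the left by $\operatorname{sgn}(\hat\beta_{T_\lambda})^t$, we get

(4.57) $\|\hat\beta_\lambda\|_1 = \operatorname{sgn}(\hat\beta_{T_\lambda})^t(X_{T_\lambda}^tX_{T_\lambda})^{-1}X_{T_\lambda}^ty$

(4.58) $\qquad - \lambda\,\operatorname{sgn}(\hat\beta_{T_\lambda})^t(X_{T_\lambda}^tX_{T_\lambda})^{-1}\operatorname{sgn}(\hat\beta_{T_\lambda}).$

Thus

$\dfrac{d\|\hat\beta_\lambda\|_1}{d\lambda}(\lambda) = -\operatorname{sgn}(\hat\beta_{T_\lambda})^t(X_{T_\lambda}^tX_{T_\lambda})^{-1}\operatorname{sgn}(\hat\beta_{T_\lambda})$

on $I_k$. Thus, by the definition of $\Sigma$, we obtain that

(4.59) $\dfrac{d\|\hat\beta_\lambda\|_1}{d\lambda}(\lambda) \le -\inf_{(S,\delta) \in \Sigma}\delta^t(X_S^tX_S)^{-1}\delta < 0$

on each $I_k$, $k \in \mathcal{K}$, and

(4.60) $\dfrac{d\|\hat\beta_\lambda\|_1}{d\lambda}(\lambda) = 0$

on each $I_k$, $k \notin \mathcal{K}$, i.e. on each $I_k$ such that $\|\hat\beta_\lambda\|_1 = 0$ for all $\lambda$ in $I_k$, if any such $I_k$ exists. Since $\lambda \mapsto \|\hat\beta_\lambda\|_1$ is continuous on $(0,+\infty)$, (4.59) implies that

(i) there exists $\tau$ in $\mathbb{R}_+^*$ such that $\hat\beta_\tau = 0$ (as an easy consequence of the Fundamental Theorem of Calculus and a contradiction argument),

(ii) $\hat\beta_\lambda = 0$ for all $\lambda \ge \tau$.

Hence $\bigcup_{k \in \mathcal{K}} I_k$ is a connected bounded interval.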
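In practice the endpoint $\tau$ can be computed directly. By (2.7), $\hat\beta_\lambda = 0$ is optimal exactly when $\|X^t y\|_\infty \le \lambda$, so $\hat\beta_\lambda \neq 0$ for $\lambda < \|X^t y\|_\infty$ and $\tau = \|X^t y\|_\infty$. The Python sketch below (NumPy assumed) checks this on both sides of $\tau$ with a hypothetical coordinate-descent helper, `lasso_cd`, that is not part of the paper.

```python
import numpy as np


def lasso_cd(X, y, lam, sweeps=3000, tol=1e-13):
    """Hypothetical helper: cyclic coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1."""
    p = X.shape[1]
    b = np.zeros(p)
    r = np.array(y, dtype=float)          # residual y - X b
    col_sq = np.sum(X * X, axis=0)
    for _ in range(sweeps):
        largest_step = 0.0
        for j in range(p):
            rho = X[:, j] @ r + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            step = new - b[j]
            if step != 0.0:
                r -= step * X[:, j]
                b[j] = new
                largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break
    return b


rng = np.random.default_rng(5)
X = rng.standard_normal((10, 20))
y = rng.standard_normal(10)
tau = np.max(np.abs(X.T @ y))            # ||X^t y||_inf
b_above = lasso_cd(X, y, 1.001 * tau)
b_below = lasso_cd(X, y, 0.999 * tau)
print("tau =", tau)
print("nonzeros just above tau:", np.count_nonzero(b_above))
print("nonzeros just below tau:", np.count_nonzero(b_below))
```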

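Step 3. $\|y - X\hat\beta_\lambda\|_2^2$ is increasing on $(0,\tau]$.

Using (4.52), we immediately see that the derivative of $\|y - X\hat\beta_\lambda\|_2^2$ is nothing but

(4.61) $\dfrac{d}{d\lambda}\|y - X\hat\beta_\lambda\|_2^2 = \dfrac{d}{d\lambda}\Big(\lambda^2\|(X_{T_\lambda}^tX_{T_\lambda})^{-1/2}\operatorname{sgn}(\hat\beta_{T_\lambda})\|_2^2\Big)$

(4.62) $\qquad = 2\lambda\|(X_{T_\lambda}^tX_{T_\lambda})^{-1/2}\operatorname{sgn}(\hat\beta_{T_\lambda})\|_2^2.$

Therefore,

(4.63) $\dfrac{d}{d\lambda}\|y - X\hat\beta_\lambda\|_2^2 \ge 2\lambda\, n\, \sigma_{\min}\big((X_{T_\lambda}^tX_{T_\lambda})^{-1}\big) > 0,$

which proves the desired result by using the continuity of $\|y - X\hat\beta_\lambda\|_2^2$ at $\tau$.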
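The derivative (4.62) can also be checked with a central difference. On an interval where the support and signs are fixed, (3.22) gives $y - X\hat\beta_\lambda = (\mathrm{Id}_n - P_{V_{T_\lambda}})y + \lambda X_{T_\lambda}(X_{T_\lambda}^tX_{T_\lambda})^{-1}\operatorname{sgn}(\hat\beta_{T_\lambda})$, a sum of two orthogonal vectors, so $\|y - X\hat\beta_\lambda\|_2^2$ is a quadratic function of $\lambda$ there whose derivative is the right-hand side of (4.62). The Python sketch below (NumPy assumed) illustrates this with a hypothetical exact solver, `lasso_exact`, based on Corollary 2.1 and usable only for tiny problems; the sizes, the seed and the step `h` are arbitrary.

```python
import itertools

import numpy as np


def lasso_exact(X, y, lam):
    """Hypothetical helper: exact LASSO solution of a tiny problem via Corollary 2.1."""
    n, p = X.shape
    if np.max(np.abs(X.T @ y)) <= lam:        # b = 0 satisfies (2.7)
        return np.zeros(p)
    for k in range(1, min(n, p) + 1):
        for T in itertools.combinations(range(p), k):
            cols = list(T)
            XT = X[:, cols]
            for signs in itertools.product((-1.0, 1.0), repeat=k):
                s = np.array(signs)
                bT = np.linalg.solve(XT.T @ XT, XT.T @ y - lam * s)
                if not np.array_equal(np.sign(bT), s):
                    continue
                b = np.zeros(p)
                b[cols] = bT
                off = [j for j in range(p) if j not in T]
                if np.all(np.abs(X[:, off].T @ (y - X @ b)) < lam):
                    return b
    raise RuntimeError("no certified candidate; the data may be degenerate")


def residual_sq(X, y, lam):
    """Return ||y - X beta_hat_lam||_2^2 together with beta_hat_lam."""
    b = lasso_exact(X, y, lam)
    return np.sum((y - X @ b) ** 2), b


rng = np.random.default_rng(7)
X = rng.standard_normal((4, 7))
y = rng.standard_normal(4)
lam = 0.4 * np.max(np.abs(X.T @ y))
h = 1e-6 * lam
fd = (residual_sq(X, y, lam + h)[0] - residual_sq(X, y, lam - h)[0]) / (2 * h)
_, b = residual_sq(X, y, lam)
T = np.flatnonzero(b)
s = np.sign(b[T])
formula = 2 * lam * (s @ np.linalg.solve(X[:, T].T @ X[:, T], s))   # right-hand side of (4.62)
print("central difference:", fd, " formula (4.62):", formula)
```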
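Step 4. $\Gamma$ is decreasing on $(0,\tau)$.

Let us study the function

(4.64) $\Phi:\ (0,+\infty) \to \mathbb{R}_+, \qquad \lambda \mapsto \lambda\|\hat\beta_\lambda\|_1.$

We immediately deduce from Step 2 and the definition of the intervals $I_k$, $k \in \mathcal{K}$, that $\Phi$ is differentiable on each $I_k$, $k \in \mathcal{K}$, and, using (3.22), its derivative on $I_k$ reads

$\Phi'(\lambda) = \|\hat\beta_\lambda\|_1 - \lambda\,\operatorname{sgn}(\hat\beta_{T_\lambda})^t(X_{T_\lambda}^tX_{T_\lambda})^{-1}\operatorname{sgn}(\hat\beta_{T_\lambda})$

(4.65) $\qquad = \|\hat\beta_\lambda\|_1 - \lambda\|(X_{T_\lambda}^tX_{T_\lambda})^{-1/2}\operatorname{sgn}(\hat\beta_{T_\lambda})\|_2^2.$

Now, since $X_{T_\lambda}$ is non-singular,

(4.66) $\|y - X\hat\beta_\lambda\|_2^2 = \lambda^2\|(X_{T_\lambda}^tX_{T_\lambda})^{-1/2}\operatorname{sgn}(\hat\beta_{T_\lambda})\|_2^2$

(4.67) $\qquad \ge \lambda^2\, n\, \sigma_{\min}\big((X_{T_\lambda}^tX_{T_\lambda})^{-1}\big) > 0$

for $\lambda > 0$. Therefore $\Gamma(\lambda) < +\infty$ on $\mathbb{R}_+^*$, $\Gamma$ is continuous on $\mathbb{R}_+^*$ and differentiable on each $I_k$. Moreover, using (4.52), we have

$\dfrac{d\Gamma}{d\lambda}(\lambda) = \dfrac{\Phi'(\lambda)\,\|y - X\hat\beta_\lambda\|_2^2 - \Phi(\lambda)\,\frac{d}{d\lambda}\|y - X\hat\beta_\lambda\|_2^2}{\|y - X\hat\beta_\lambda\|_2^4}.$

Hence, using (4.65) and (4.52),

$\dfrac{d\Gamma}{d\lambda}(\lambda) = \dfrac{\big(\|\hat\beta_\lambda\|_1 - \lambda\|(X_{T_\lambda}^tX_{T_\lambda})^{-1/2}\operatorname{sgn}(\hat\beta_{T_\lambda})\|_2^2\big) - 2\|\hat\beta_\lambda\|_1}{\lambda^2\|(X_{T_\lambda}^tX_{T_\lambda})^{-1/2}\operatorname{sgn}(\hat\beta_{T_\lambda})\|_2^2}$

$\qquad = -\dfrac{\|\hat\beta_\lambda\|_1}{\lambda^2\|(X_{T_\lambda}^tX_{T_\lambda})^{-1/2}\operatorname{sgn}(\hat\beta_{T_\lambda})\|_2^2} - \dfrac{1}{\lambda} < -\dfrac{1}{\lambda}$

on each $I_k$, $k \in \mathcal{K}$. We can thus conclude, by the non-singularity of $X_{T_\lambda}$ and the continuity of $\Gamma$, that $\Gamma$ is decreasing on $(0,\tau)$, as announced. □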
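The closed form obtained for $d\Gamma/d\lambda$ in Step 4 can be compared with a central difference in the small-$\lambda$ regime of Step 1.b, where $|T_\lambda| = n$ and (4.52) holds. The Python sketch below (NumPy assumed) does so with a hypothetical exact solver, `lasso_exact`, based on Corollary 2.1 and usable only for tiny problems. The value $\lambda = 10^{-4}\|X^ty\|_\infty$ is an assumption meant to fall in that regime, and the test checks $|T_\lambda| = n$ before comparing.

```python
import itertools

import numpy as np


def lasso_exact(X, y, lam):
    """Hypothetical helper: exact LASSO solution of a tiny problem via Corollary 2.1."""
    n, p = X.shape
    if np.max(np.abs(X.T @ y)) <= lam:        # b = 0 satisfies (2.7)
        return np.zeros(p)
    for k in range(1, min(n, p) + 1):
        for T in itertools.combinations(range(p), k):
            cols = list(T)
            XT = X[:, cols]
            for signs in itertools.product((-1.0, 1.0), repeat=k):
                s = np.array(signs)
                bT = np.linalg.solve(XT.T @ XT, XT.T @ y - lam * s)
                if not np.array_equal(np.sign(bT), s):
                    continue
                b = np.zeros(p)
                b[cols] = bT
                off = [j for j in range(p) if j not in T]
                if np.all(np.abs(X[:, off].T @ (y - X @ b)) < lam):
                    return b
    raise RuntimeError("no certified candidate; the data may be degenerate")


def gamma(X, y, lam):
    """Gamma(lam) = lam * ||b||_1 / ||y - X b||_2^2, returned with b = beta_hat_lam."""
    b = lasso_exact(X, y, lam)
    return lam * np.abs(b).sum() / np.sum((y - X @ b) ** 2), b


rng = np.random.default_rng(6)
X = rng.standard_normal((3, 5))
y = rng.standard_normal(3)
lam = 1e-4 * np.max(np.abs(X.T @ y))
h = 1e-3 * lam
fd = (gamma(X, y, lam + h)[0] - gamma(X, y, lam - h)[0]) / (2 * h)
_, b = gamma(X, y, lam)
T = np.flatnonzero(b)
s = np.sign(b[T])
a = s @ np.linalg.solve(X[:, T].T @ X[:, T], s)      # ||(X_T^t X_T)^{-1/2} sgn||_2^2
formula = -np.abs(b).sum() / (lam**2 * a) - 1.0 / lam
print("support size:", T.size, " finite difference:", fd, " formula:", formula)
```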